Conditional Generation from Unconditional Diffusion Models using Denoiser Representations
Denoising diffusion models have gained popularity as a generative modeling
technique for producing high-quality and diverse images. Applying these models
to downstream tasks requires conditioning, which can take the form of text,
class labels, or other forms of guidance. However, providing conditioning
information to these models can be challenging, particularly when annotations
are scarce or imprecise. In this paper, we propose adapting pre-trained
unconditional diffusion models to new conditions using the learned internal
representations of the denoiser network. We demonstrate the effectiveness of
our approach on various conditional generation tasks, including
attribute-conditioned generation and mask-conditioned generation. Additionally,
we show that augmenting the Tiny ImageNet training set with synthetic images
generated by our approach improves the classification accuracy of ResNet
baselines by up to 8%. Our approach provides a powerful and flexible way to
adapt diffusion models to new conditions and generate high-quality augmented
data for various conditional generation tasks.
S-VolSDF: Sparse Multi-View Stereo Regularization of Neural Implicit Surfaces
Neural rendering of implicit surfaces performs well in 3D vision
applications. However, it requires dense input views as supervision. When only
sparse input images are available, output quality drops significantly due to
the shape-radiance ambiguity problem. We note that this ambiguity can be
constrained when a 3D point is visible in multiple views, as is the case in
multi-view stereo (MVS). We thus propose to regularize neural rendering
optimization with an MVS solution. The use of an MVS probability volume and a
generalized cross entropy loss leads to a noise-tolerant optimization process.
In addition, neural rendering provides global consistency constraints that
guide the MVS depth hypothesis sampling and thus improves MVS performance.
Given only three sparse input views, experiments show that our method not only
outperforms generic neural rendering models by a large margin but also
significantly increases the reconstruction quality of MVS models. Project
webpage: https://hao-yu-wu.github.io/s-volsdf/
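The abstract above names a generalized cross entropy loss but does not give its form. As a minimal sketch, assuming the commonly used definition L_q(p) = (1 - p^q) / q with q in (0, 1], the snippet below shows why such a loss tolerates noisy targets: it stays bounded by 1/q, whereas ordinary cross entropy grows without limit as the probability assigned to the target goes to zero. The function name, the default q = 0.7, and the scalar interface are illustrative assumptions, not details taken from the paper.

```python
import math


def generalized_cross_entropy(p_true: float, q: float = 0.7) -> float:
    """Assumed form of the loss: L_q = (1 - p_true**q) / q, with q in (0, 1].

    p_true is the probability the model assigns to the target label
    (hypothetically, one depth hypothesis in an MVS probability volume).
    """
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    if not 0.0 <= p_true <= 1.0:
        raise ValueError("p_true must be a probability in [0, 1]")
    return (1.0 - p_true ** q) / q


def cross_entropy(p_true: float) -> float:
    """Ordinary cross entropy for comparison: -log(p_true)."""
    return -math.log(p_true)


if __name__ == "__main__":
    # Compare both losses as the target becomes less and less likely.
    for p in (0.9, 0.5, 0.1, 0.001):
        print(p, generalized_cross_entropy(p), cross_entropy(p))
```

With q = 1 the expression reduces to 1 - p, and as q approaches 0 it approaches -log(p), so q trades off between the two behaviours.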
PathLDM: Text conditioned Latent Diffusion Model for Histopathology
To achieve high-quality results, diffusion models must be trained on large
datasets. This can be notably prohibitive for models in specialized domains,
such as computational pathology. Conditioning on labeled data is known to help
in data-efficient model training. Therefore, histopathology reports, which are
rich in valuable clinical information, are an ideal choice as guidance for a
histopathology generative model. In this paper, we introduce PathLDM, the first
text-conditioned Latent Diffusion Model tailored for generating high-quality
histopathology images. Leveraging the rich contextual information provided by
pathology text reports, our approach fuses image and textual data to enhance
the generation process. By utilizing GPT's capabilities to distill and
summarize complex text reports, we establish an effective conditioning
mechanism. Through strategic conditioning and necessary architectural
enhancements, we achieved a SoTA FID score of 7.64 for text-to-image generation
on the TCGA-BRCA dataset, significantly outperforming the closest
text-conditioned competitor with FID 30.1.
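For readers unfamiliar with the metric quoted above, the Fréchet Inception Distance is usually defined as follows (lower is better). This is the standard definition, not something stated in the abstract; the symbols are introduced here for illustration, with \mu_r, \Sigma_r and \mu_g, \Sigma_g denoting the mean and covariance of Inception-network features computed on real and generated images respectively.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```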
GFlowNet-EM for learning compositional latent variable models
Latent variable models (LVMs) with discrete compositional latents are an
important but challenging setting due to a combinatorially large number of
possible configurations of the latents. A key tradeoff in modeling the
posteriors over latents is between expressivity and tractable optimization. For
algorithms based on expectation-maximization (EM), the E-step is often
intractable without restrictive approximations to the posterior. We propose the
use of GFlowNets, algorithms for sampling from an unnormalized density by
learning a stochastic policy for sequential construction of samples, for this
intractable E-step. By training GFlowNets to sample from the posterior over
latents, we take advantage of their strengths as amortized variational
inference algorithms for complex distributions over discrete structures. Our
approach, GFlowNet-EM, enables the training of expressive LVMs with discrete
compositional latents, as shown by experiments on non-context-free grammar
induction and on images using discrete variational autoencoders (VAEs) without
conditional independence enforced in the encoder.
Comment: ICML 2023; code: https://github.com/GFNOrg/GFlowNet-E
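The intractability of the E-step mentioned in the abstract above can be seen from the standard EM decomposition, sketched here in generic notation that is assumed for illustration and not taken from the paper: x is the observation, z the discrete latent, \theta the model parameters, and q any distribution over z.

```latex
\log p_\theta(x)
  = \mathbb{E}_{q(z)}\!\left[ \log p_\theta(x, z) - \log q(z) \right]
  + \mathrm{KL}\!\left( q(z) \,\Vert\, p_\theta(z \mid x) \right)

\text{E-step: } q(z) \leftarrow p_\theta(z \mid x) \propto p_\theta(x, z)
\qquad
\text{M-step: } \theta \leftarrow \arg\max_\theta \; \mathbb{E}_{q(z)}\!\left[ \log p_\theta(x, z) \right]
```

The exact E-step needs the normalized posterior, whose normalizing constant sums over combinatorially many configurations of z. Because the posterior is proportional to the joint p_\theta(x, z), which can be evaluated pointwise, it is an unnormalized density of the kind the abstract says GFlowNets are trained to sample from.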